A Statistical Approach to Classify Nationality of Name

Authors

  • Kobkrit Viriyayudhakorn
  • Chativit Prayoonsri
  • Chaklam Silpasuwanchai
  • Cholwich Nattee
  • Thanaruk Theeramunkong
Abstract

Named entities (NEs), especially personal names, are important for interpreting certain kinds of text documents, such as news. To extract personal names efficiently, statistical language models are required to capture the characteristics of personal names. Among these characteristics, the nationality of a name is a useful cue for interpreting the text document. Automatically inferring nationality from a name also directly helps a user gain more information from the name. In this paper, we therefore propose a statistical approach to identify the nationality of names written in Thai. Features are extracted from decomposed personal names, and their probabilistic bigram and trigram models are used with naive Bayesian classification to assign the most appropriate class to a name. To evaluate the proposed approach, a number of experiments are conducted on real-world data. The experimental results show that our approach works effectively, with about 94% accuracy.
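
The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain character bigrams and trigrams as features (the paper's decomposition of Thai names may differ), add-one smoothing, and a tiny invented set of romanized training names. The names `char_ngrams`, `features`, and `NgramNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n):
    """Return the character n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def features(name):
    # Assumption: bigram and trigram features are simply pooled together.
    return char_ngrams(name, 2) + char_ngrams(name, 3)


class NgramNaiveBayes:
    """Naive Bayes over character n-gram counts with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            for f in features(name):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_scores(self, name):
        """Return log P(class) + sum of log P(feature | class) per class."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(count / total)
            for f in features(name):
                score += math.log((counts[f] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Invented toy data for illustration only; not taken from the paper.
train_names = ["somchai", "somsak", "siriporn", "kittisak",
               "takahashi", "yamamoto", "tanaka", "nakamura"]
train_labels = ["thai"] * 4 + ["japanese"] * 4

model = NgramNaiveBayes().fit(train_names, train_labels)
print(model.predict("somporn"), model.predict("yamanaka"))
```

Scores are summed in log space to avoid numerical underflow on longer names, and smoothing keeps an unseen n-gram from driving a class probability to zero.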

Similar resources

A statistical approach to classify Skype traffic

Skype is one of the most powerful and high-quality chat tools; it allows its users to use many services, such as audio transfer, messaging, and video and audio conferencing, for free. Skype accounts for a large share of Internet traffic. Hence, Internet service providers need to identify this traffic for quality of service and network management. On the other hand, Skype developers ...

A New Statistical Approach for Recognizing and Classifying Patterns of Control Charts (RESEARCH NOTE)

Control chart pattern (CCP) recognition techniques are widely used to identify potential process problems in modern industries. Recently, artificial neural network (ANN)-based techniques have become very popular for recognizing CCPs. However, finding a suitable architecture for an ANN-based CCP recognizer and training it are time consuming and tedious. In addition, because of the black box ...

Application of Decision on Beliefs for Fault Detection in uni-variate Statistical Process Control

In this research, the decision on belief (DOB) approach was employed to analyze and classify the states of uni-variate quality control systems. The concept of DOB and its application in decision-making problems were introduced, and then a methodology for modeling a statistical quality control problem using the DOB approach was discussed. For this iterative approach, the belief for a system being out-...

A Model of Iranian EFL Learners' Cultural Identity: A Structural Equation Modeling Approach

This study aimed, firstly, to investigate the underlying components of Iranian cultural identity and, secondly, to confirm the aforementioned components via Structural Equation Modeling (SEM) analysis. In order to achieve these goals, the researchers reviewed the extensive local and international literature on language, culture and identity. Based on the literature and consultations with a grou...

The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)

Understanding architectural heritage and its various aspects has always been a focus of the international conservation community. In recent decades, even though place-names are part of living history and cultural heritage, they have constantly faced rapid, abrupt changes. As such, in the conservation literature, most studies have skipped ad...

Publication date: 2007